Using Bayesian Inference in Classical Hypothesis Testing

Author

  • Milo Schield
Abstract

When students obtain a statistically significant sample at a 5% level of significance, they may conclude they can be 95% confident that the alternate hypothesis is true. From a classical perspective, this conclusion is unwarranted. Two elements may inadvertently support this unwarranted conclusion: the traditional definitions of Type I error and alpha, and the silence about confidence. From a Bayesian perspective, this conclusion might be warranted. Bayes rule can relate the Bayesian and classical probabilities of Type I error if classical hypotheses are treated as point masses and if one can treat degrees of belief about the truth of a state of nature as a probability. If the truth of the null and alternate are equally likely, if β = α, and if the sample statistic triggers a rejection of the null, then the Bayesian probability of Type I error is numerically equal to alpha and the Bayesian probability the alternate is true equals 1 − α. The Bayesian probability of Type I error increases as the alternate becomes more improbable for a given level of alpha. Using this technique to select alpha and to interpret p-values may improve understanding of classical tests and decrease statistical opportunism.

INTRODUCTION

Students in introductory statistics are usually introduced to the classical fixed-level hypothesis test. Given an alpha of 5% and a statistically significant random sample, they may conclude they can be 95% confident that the alternate hypothesis is true.

I. CLASSICAL EVALUATION

From a classical perspective, this conclusion is unwarranted and in error. However, there are two aspects of the classical approach that might encourage this error: the traditional definitions of Type I error and alpha, and the silence about confidence in hypothesis testing.

A. DEFINITIONS OF ALPHA & TYPE I ERROR

In presenting the classical hypothesis test, alpha is traditionally defined as the probability of Type I error.
Type I error is often illustrated as being the intersection of two conditions, as shown in Table 1.

Table 1: Figurative Description of Hypothesis Testing

                          STATE OF NATURE
  DECISION                Null is true     Null is false
  Fail to reject null     OK outcome       Type II error
  Reject null             Type I error     OK outcome

Students may think as follows. On the one hand, one might take 50 samples from the null distribution and perhaps 2 of them fall in the reject region. On the other hand, one might take 50 samples from the alternate distribution and perhaps 38 of them fall in the reject region. Thus, students might create the following table.

Table 2: Table of hypothetical counts

                          STATE OF NATURE
  COUNT                   Null is true     Null is false    Total
  Fail to reject          48               12                60
  Reject                   2               38                40
  Total                   50               50               100

Now there may be some errors in this. First, in reality either the null is true or it is not; you cannot have counts in both columns. Second, alpha is the criterion by which the rejection region was defined – prior to sampling. Alpha is not obtained by sampling – except in the limit of large numbers. These relative frequencies are just estimates of alpha.

Problem: Given these counts and the aforementioned definitions of alpha and Type I error, some students would say that alpha is a table percentage, since Type I error is a single cell. Given this data, those students might estimate alpha as 2% (2/100). Some students may calculate alpha as a row percentage, since rejecting the null is a necessary condition for Type I error. Given this data, these students would estimate alpha as 5% (2/40). Actually, alpha is a column percentage, since alpha = P(null is rejected | null is true). In this case, alpha is properly estimated as 4% (2/50). Unfortunately, the traditional definition of alpha may prevent students from seeing alpha as a column percent.
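The three competing readings of alpha can be checked directly against the hypothetical counts in Table 2. A minimal sketch: the counts are the paper's illustrative values, and the variable names are mine.

```python
# Hypothetical counts from Table 2.
# Rows: fail to reject / reject; columns: null true / null false.
fail_true, fail_false = 48, 12
reject_true, reject_false = 2, 38

total = fail_true + fail_false + reject_true + reject_false   # 100
reject_row = reject_true + reject_false                       # 40
null_true_col = fail_true + reject_true                       # 50

# Type I error is the single cell (reject, null true) = 2.
table_pct = reject_true / total          # 2/100 = 0.02 -- a mistaken reading
row_pct = reject_true / reject_row       # 2/40  = 0.05 -- the Bayesian (row) reading
col_pct = reject_true / null_true_col    # 2/50  = 0.04 -- the proper estimate of alpha
```

The column percentage is the only one that estimates P(null is rejected | null is true).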
10/01/96 Section on Statistical Education JSM-96 96ASA.doc Page 2 Milo Schield

Traditional Solution: To avoid this problem, most authors traditionally define – not just describe – Type I error as being conditional on a column-based process. Typical examples include:
  • Type I error: rejecting HO when HO is true
  • Type I error: rejecting a null hypothesis that is true
  • Type I error: rejecting HO given that HO is true
  • Type I error: occurs if HO is rejected when it is true

The appendix contains examples of conditional definitions taken from Smith; Kitchens; Iman; Moore; Moore and McCabe; Hogg and Tanis; Neter, Wasserman and Whitmore; and Mendenhall, Sheaffer and Wackerly. Note that in all of these conditional definitions of Type I error, the stipulated condition is that the null is true. Thus, these traditional definitions imply that Type I error occurs only within a column-based process – a classical test of significance. By making Type I error meaningful only if one is sampling from the null, then – and only then – is the short-form statement of alpha (the probability of Type I error) a proper definition.

Disadvantages: The traditional approach has several disadvantages. It requires students to ignore the simple definition of Type I error as an intersection that is readily illustrated by means of a 2x2 table. It requires students to consider Type I error as being conditional and thus meaningful only within a test of significance. It requires students to consider alpha as being conditional without using the normal keywords for conditionality. This process-oriented, conditional definition of Type I error is extremely subtle and very easy to misinterpret.

Explanation: Given these disadvantages, why do authors traditionally present alpha as being unconditional and Type I error as being conditional on a column-based process?
Authors may be using these unusual definitions to introduce a hidden premise: Bayesian conditional probabilities (a row-based process) are meaningless in a hypothesis test involving a state of nature. Since the hypotheses are about a state of nature, and since a state of nature is, in fact, either true or false, they argue it follows that a Bayesian row probability of Type I error is meaningless since it is either 0 or 1. Once Type I error is defined as a column-based process (rejecting the null given the null is true), then one cannot use this concept in any row-based process (calculating a Bayesian probability of Type I error).

R. A. Fisher regarded probability theorems involving “psychological tendencies” (Bayesian reasoning) as “useless for scientific purposes” (Fisher, 1947). One wonders if the conditional wording of Type I error reflects this conviction. Authors who define Type I error conditionally may simply be following his practice without intending any claim about Bayesian inference.

This process-based definition makes highly assertive and highly disputable claims about the epistemological status of probability. This claim is much stronger than saying that a classical test of significance simply ignores predictions of the Bayesian probability of error. This hidden premise attempts to reduce knowledge from being contextual to being intrinsic. To understand the contextual versus intrinsic distinction, consider having tossed a fair coin in a situation where no one yet knows the true state of nature about this coin. In reality (metaphysically or intrinsically), the probability of heads is either 0 or 1; but in our minds (epistemologically or contextually) we do not know this reality. So to us the state of the coin is still a random variable with a probability of 0.5 of being heads.
Students, in placing counts in both columns of Table 2, are acting as though the state of nature is like the state of this coin: determined in reality but uncertain in our context of knowledge. Their action is consistent with the view that knowledge is contextual – not intrinsic.

In summary, the traditional column-based definition of Type I error hides a highly disputable assertion about the nature of probability. By enclosing this assertion in a definition, students, teachers and even authors may have difficulty recognizing that a highly disputable philosophical argument is being made. Teachers may bypass this problem by referring to alpha as P(reject null | null is true). But this correct outcome hides the difficulties in using the traditional definitions of alpha and Type I error. Bayesians avoid this problem by simply avoiding the use of Type I error in determining P(null is true | null is rejected).

Recommendation: Authors and teachers should abandon the traditional definitions and use definitions that are more general:
  • Define Type I error as an intersection of two logically coequal conditions: Type I error occurs whenever the null is true and the null is rejected.
  • Define alpha conditionally as a column-based process: alpha is the probability of Type I error if the sample is drawn from the null distribution.

This definition of Type I error makes it descriptively neutral rather than disputably assertive (see Kelley’s review of definitions in The Art of Reasoning). This general approach has several advantages. It presents Type I error simply as a single cell in a 2x2 table. It makes explicit the conditional nature of alpha: P(null will be rejected | null is true). It permits Bayesians to talk about the probability of Type I error given the null is rejected.
Most importantly, in terms of Table 1, it should decrease the chance of mistaking alpha (a column percentage) for the Bayesian probability of Type I error (a row percentage).

B. SILENCE ABOUT CONFIDENCE

Within a classical approach, confidence is never mentioned in discussing hypothesis testing. The traditional explanation is that a particular hypothesis describes a state of nature. As such, the hypothesis is either true or false. One has no choice about which distribution one samples from. Among the statistically significant samples, either all or none will result in Type I error.

But suppose students are interested in confidence. In confidence intervals, students were told there is a complementary relation between alpha (the probability of error) and the confidence level. This may generate certain expectations in hypothesis testing. And since most texts and teachers are resolutely silent about confidence in hypothesis testing, students presume confidence applies to what they are interested in as decision makers: the confidence that a decision is correct. Thus, they conclude that an alpha of 5% means they can be 95% confident that a decision to reject the null is correct.

The solution to the problem of silence is to be explicit about the inability of the classical approach to speak of confidence, to present the Bayesian approach, and then to present the strengths and weaknesses of each approach. For as Berger (1980) concluded, “most such users (and probably the overwhelming majority) interpret classical measures in the direct probabilistic [Bayesian] sense. (Indeed the only way we have had even moderate success, in teaching elementary statistics students that an error probability is not a probability of a hypothesis, is to teach enough Bayesian analysis to be able to demonstrate the difference with examples.)”
II. BAYESIAN JUSTIFICATION

From a Bayesian perspective, one can evaluate the Bayesian probability of Type I error associated with a classical hypothesis test by following a three-step process. The first step involves the use of Bayes rule in comparing the quality and predictive power of medical tests. This step is not controversial so long as each subject can be either diseased or disease-free. The Bayesian approach to medical tests is featured by Ellisor and Morel in Statistics for Blood Bankers. The Bayesian approach to acceptance testing is presented by Moore and McCabe in Introduction to the Practice of Statistics and by Neter, Wasserman and Whitmore in Applied Statistics. The Bayesian approach to medical tests and acceptance testing is reviewed at length by Hamburg in Statistical Analysis for Decision Making.

Step 1: Evaluating Medical Tests on Individuals

Medical tests on individuals can be evaluated using a 2x2 table involving two contradictory states of disease for each individual and two test outcomes.

Table 3: the four cells in a 2x2 table

                     DISEASE STATUS
  TEST RESULT        Disease-free     Diseased
  Negative           OK outcome       Type II error
  Positive           Type I error     OK outcome

In Table 4, the following row probabilities are used:
  • δ is the Bayesian probability the subject is disease-free given that the test is positive (Type I error).
  • ε is the Bayesian probability the subject is diseased given the test is negative (Type II error).
  • γ is the prior probability that the subject is diseased.

Let δ′ = 1 − δ, ε′ = 1 − ε, and γ′ = 1 − γ.
  • δ′ is the Bayesian probability that the subject is diseased given that the test is positive. This is called the Positive Predictive Value (PPV).
  • ε′ is the Bayesian probability that the subject is not diseased given the test is negative. This is called the Negative Predictive Value (NPV).

PPV and NPV are used by Kolins in Statistics for Blood Bankers, edited by Ellisor and Morel (1983).
Kolins references Galen and Gambino (1975), Beyond Normality, John Wiley and Sons, as a primary source.

Table 4: the quality of a prediction (row percents)

                     DISEASE STATUS
  TEST RESULT        Disease-free     Diseased        Total
  Negative           ε′ = 1 − ε       ε               1
  Positive           δ                δ′ = 1 − δ      1
  Incidence          γ′ = 1 − γ       γ               1

Table 5 illustrates the column probabilities associated with sensitivity and specificity in medical tests. The symbol α is used to identify the probability of a positive test among those who are disease-free. At this point this alpha has no relation to the alpha used in classical hypothesis testing, but this choice foreshadows what will come.

Table 5: the quality of a test (column percents)

                     DISEASE STATUS
  TEST RESULT        Disease-free       Diseased
  Negative           specificity (α′)   β
  Positive           α                  sensitivity (β′)
  Total              1                  1

When α, β and γ are known, we can generate the counts in a 2x2 table for any test involving N subjects. Note that α′ = 1 − α, β′ = 1 − β, and γ′ = 1 − γ.

Table 6: the counts for each cell

                     DISEASE STATUS
  TEST RESULT        Disease-free     Diseased     Total
  Negative           α′γ′N            βγN          (α′γ′ + βγ)N
  Positive           αγ′N             β′γN         (αγ′ + β′γ)N
  Prevalence         γ′N              γN           N

ROW VERSUS COLUMN PROBABILITIES

Row probabilities can be generated from column probabilities using the counts in Table 6 or by using Bayes rule: δ = P(Positive & Disease-free) / P(Positive).

  δ = αγ′ / (αγ′ + β′γ)    Eq. 1a
  ε = βγ / (α′γ′ + βγ)     Eq. 1b

Column probabilities can be generated from row probabilities by solving Eq. 1a and 1b for alpha and beta:

  α = δγβ′ / (γ′δ′)    Eq. 2a
  β = εγ′α′ / (γε′)    Eq. 2b

In summary, γ is the probability that a random patient has the disease – prior to (before) the test. If the patient tests positive, then δ′ is the revised probability the patient is diseased – posterior to (after) the test. The symbols δ and ε are reversed from those used in Ellisor and Morel’s Statistics for Blood Bankers.
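Equations 1a/1b and their inversions 2a/2b can be exercised numerically. A small sketch, assuming illustrative values of α, β and γ (the numbers are mine, not the paper's):

```python
def row_probs(alpha, beta, gamma):
    """Eq. 1a/1b: Bayesian (row) error probabilities from the test's
    column probabilities (alpha, beta) and the prior incidence gamma."""
    gamma_c = 1 - gamma                                                # gamma'
    delta = alpha * gamma_c / (alpha * gamma_c + (1 - beta) * gamma)   # Eq. 1a
    eps = beta * gamma / ((1 - alpha) * gamma_c + beta * gamma)        # Eq. 1b
    return delta, eps

# Illustrative values: a fairly specific, fairly sensitive test; rare disease.
alpha, beta, gamma = 0.05, 0.10, 0.02
delta, eps = row_probs(alpha, beta, gamma)

# Eq. 2a/2b recover the column probabilities from the row probabilities.
alpha_back = delta * gamma * (1 - beta) / ((1 - gamma) * (1 - delta))  # Eq. 2a
beta_back = eps * (1 - gamma) * (1 - alpha) / (gamma * (1 - eps))      # Eq. 2b
```

With these values δ ≈ 0.73: even a 5% false-positive rate yields a large Bayesian probability of Type I error when the disease is rare, which previews the paper's point about improbable alternates.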
This reversal links the alphabetic sequence (δ and ε) with Type I and Type II errors respectively, just as with α and β.

Step 2: Reducing a Continuous Hypothesis

The second step in evaluating the Bayesian probability of Type I error in a classical hypothesis test is to reduce a continuous quantitative variable to a point mass. Reducing a null hypothesis from a range (HO: μ ≤ μO) to a point mass (HO: μ = μO) is standard procedure within the classical approach. Specifically, the point mass is situated so as to maximize the associated error.

Step 3: Using Degrees of Belief and States of Nature

The third step is to give prior probabilities about states of nature based on degrees of belief (Bayesian) the same epistemic status as prior probabilities about individual subjects based on relative frequencies (frequentist). This entails treating the existence of both null and alternate as simultaneously possible in thought even though in reality only one is true. It means treating Table 1 as being conceptually similar to Table 2. For more on this very important and highly disputable step, read Scientific Reasoning by Howson and Urbach.

Summary: If we allow degrees of belief about a state of nature, then the dichotomous model used in medical testing can encompass classical hypothesis tests involving a quantitative variable. The Bayesian approach to classical hypothesis testing is mentioned in Statistical Reasoning by Smith. It is discussed in Statistical Decision Theory and Bayesian Analysis by Berger and in Statistical Analysis for Decision Making by Hamburg.

General Case

With a fixed sample size and specific values for alpha and beta, the Bayesian probability of Type I error can be deduced using Eq. 1a for any given value of the prior probability. As the alternate becomes more unlikely, alpha must decrease for a fixed level of Bayesian confidence.
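The general-case claim can be checked with Eq. 1a: holding α and β fixed, the Bayesian probability of Type I error (δ) grows as the prior probability of the alternate (γ) shrinks. A sketch with assumed values:

```python
def bayes_type1(alpha, beta, gamma):
    """Eq. 1a: Bayesian probability of Type I error, given a rejection."""
    gamma_c = 1 - gamma
    return alpha * gamma_c / (alpha * gamma_c + (1 - beta) * gamma)

# Fixed alpha = beta = 0.05; the prior on the alternate drops from 0.5 to 0.05.
priors = [0.5, 0.2, 0.1, 0.05]
deltas = [bayes_type1(0.05, 0.05, g) for g in priors]
# delta rises monotonically as the alternate becomes less probable:
# from 0.05 at even priors up to 0.5 when the alternate has prior 0.05.
```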
As Hamburg (1983) noted: “Prior knowledge concerning the likelihood of truth of the competing hypotheses also helps the investigator in establishing the significance level. Hence, if it is considered likely that the null hypothesis is true, we will tend to set α at a very low figure in order to maintain a low probability of erroneously rejecting that hypothesis.”

Beta Equals Alpha

If Ho: μ ≤ μo, HA: μ > μ1 and μ1 > μo, then by varying the separation (μ1 − μo) or the sample size (n), one can obtain β = α for any value of alpha. If β = α, then

  δ = αγ′ / (αγ′ + α′γ)    Eq. 3a
  α = δγ / (γ′δ′ + δγ)     Eq. 3b

These relations are illustrated in Figures 1 and 2.

Figure 1: δ as a function of γ for a fixed value of α.
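As a numeric check on Eq. 3a/3b (a sketch; the values are assumed): with β = α and equal priors (γ = 0.5), the Bayesian probability of Type I error collapses to α itself, which is the equal-likelihood result stated in the abstract.

```python
def delta_from_alpha(alpha, gamma):
    """Eq. 3a: Bayesian probability of Type I error when beta = alpha."""
    gamma_c = 1 - gamma
    return alpha * gamma_c / (alpha * gamma_c + (1 - alpha) * gamma)

def alpha_from_delta(delta, gamma):
    """Eq. 3b: the alpha needed to hit a target Bayesian error delta."""
    gamma_c = 1 - gamma
    return delta * gamma / (gamma_c * (1 - delta) + delta * gamma)

# Equal priors: delta equals alpha, so confidence in the alternate is 1 - alpha.
d = delta_from_alpha(0.05, 0.5)   # 0.05
a = alpha_from_delta(d, 0.5)      # round-trips back to 0.05

# An improbable alternate (gamma = 0.1) demands a much smaller alpha
# to keep the Bayesian probability of Type I error at 5%.
a_small = alpha_from_delta(0.05, 0.1)
```

Here a_small ≈ 0.0058, nearly an order of magnitude below the nominal 5%, illustrating how this technique could guide the selection of alpha.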

Publication date: 1998